REMOOV: A Tool for Online Handling of Out-of-Vocabulary Words in Machine Translation

Author

  • Nizar Habash
Abstract

REMOOV is a tool for online handling of out-of-vocabulary (OOV) words in statistical machine translation. REMOOV employs four techniques. Spelling expansion and morphological expansion are used to produce alternative in-vocabulary (INV) forms of OOV words. Dictionary term expansion and proper name transliteration produce target translations directly. These techniques can be used to expand the phrase table utilized in decoding or as part of an input/output lattice expansion. Results of using REMOOV show a consistent improvement over a state-of-the-art baseline. This paper describes the different components and parameters of the REMOOV tool.
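As a rough illustration of the first technique, the sketch below detects OOV tokens and proposes in-vocabulary (INV) alternatives by spelling expansion. It is not REMOOV's code: the function names (`find_oov`, `single_edits`, `spelling_expansion`, `expand_oov`), the Latin alphabet, and the choice of single-character edits (deletion, insertion, substitution, adjacent swap) are all assumptions made for this example.

```python
import string


def find_oov(tokens, vocab):
    """Return the tokens that are missing from the vocabulary, in input order."""
    return [t for t in tokens if t not in vocab]


def single_edits(word, alphabet=string.ascii_lowercase):
    """Generate every string one edit away from `word` (hypothetical edit set)."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {a + b[1:] for a, b in splits if b}
    swaps = {a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1}
    subs = {a + c + b[1:] for a, b in splits if b for c in alphabet}
    inserts = {a + c + b for a, b in splits for c in alphabet}
    return (deletes | swaps | subs | inserts) - {word}


def spelling_expansion(word, vocab):
    """Keep only the edited forms that are in-vocabulary (INV)."""
    return sorted(c for c in single_edits(word) if c in vocab)


def expand_oov(tokens, vocab):
    """Map each OOV token to its INV spelling alternatives (possibly empty)."""
    return {w: spelling_expansion(w, vocab) for w in find_oov(tokens, vocab)}


if __name__ == "__main__":
    vocab = {"the", "translation", "of", "words"}
    print(expand_oov(["the", "translaton", "of", "wrods"], vocab))
```

In a phrase-table setting, each INV alternative found this way could be used to look up existing phrase pairs and re-associate them with the OOV form; how such entries are weighted is left out of this sketch.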

Similar resources

Four Techniques for Online Handling of Out-of-Vocabulary Words in Arabic-English Statistical Machine Translation

We present four techniques for online handling of Out-of-Vocabulary words in Phrase-based Statistical Machine Translation. The techniques use spelling expansion, morphological expansion, dictionary term expansion and proper name transliteration to reuse or extend a phrase table. We compare the performance of these techniques and combine them. Our results show a consistent improvement over a stat...

Exploiting Parallel Corpus for Handling Out-of-Vocabulary Words

This paper presents a hybrid model for handling out-of-vocabulary words in Japanese-to-English statistical machine translation output by exploiting parallel corpus. As the Japanese writing system makes use of four different script sets (kanji, hiragana, katakana, and romaji), we treat these scripts differently. A machine transliteration model is built to transliterate out-of-vocabulary Japanese k...

Handling of Out-of-vocabulary Words in Japanese-English Machine Translation by Exploiting Parallel Corpus

A large number of loanwords and orthographic variants in Japanese pose a challenge for machine translation. In this article, we present a hybrid model for handling out-of-vocabulary words in Japanese-to-English statistical machine translation output by exploiting parallel corpus. As the Japanese writing system makes use of four different script sets (kanji, hiragana, katakana, and romaji), we t...

Images as Context in Statistical Machine Translation

This paper reports ongoing experiments towards exploiting the use of images to provide additional context for statistical machine translation (SMT). We investigate whether this contextual information can be helpful in targeting two well-known challenges in machine translation: ambiguity (incorrect translation of words that have multiple senses) and out-of-vocabulary words (words left untranslat...

Handling OOV Words in Dialectal Arabic to English Machine Translation

Dialects and standard forms of a language typically share a set of cognates that could bear the same meaning in both varieties or only be shared homographs but serve as faux amis. Moreover, there are words that are used exclusively in the dialect or the standard variety. Both phenomena, faux amis and exclusive vocabulary, are considered out of vocabulary (OOV) phenomena. In this paper, we prese...

Publication date: 2009